Abstract: We consider the problem of estimating the error variance in a general linear model when the error distribution is assumed to be spherically symmetric, but not necessarily Gaussian. In particular, we study the case of a scale mixture of Gaussians, including the particularly important case of the multivariate-t distribution. Under Stein's loss, we construct a class of estimators that improve on the usual best unbiased (and best equivariant) estimator. Our class has the interesting double robustness property of being simultaneously generalized Bayes (for the same generalized prior) and minimax over the entire class of scale mixtures of Gaussian distributions.
1. Introduction

Suppose the linear regression model is used to relate $y$ to the $p$ predictors $x_1, \dots, x_p$,

$$y = \alpha 1_n + X\beta + \sigma\epsilon, \qquad (1.1)$$

where $\alpha$ is an unknown intercept parameter, $1_n$ is an $n \times 1$ vector of ones, $X = (x_1, \dots, x_p)$ is an $n \times p$ design matrix, and $\beta$ is a $p \times 1$ vector of unknown regression coefficients. In the error term, $\sigma$ is an unknown scalar and $\epsilon = (\epsilon_1, \dots, \epsilon_n)'$ has a spherically symmetric distribution,

$$\epsilon \sim f(\epsilon'\epsilon), \qquad (1.2)$$

where $f(\cdot)$ is the probability density, $E[\epsilon] = 0_n$, and $\mathrm{Var}[\epsilon] = I_n$. We assume that the columns of $X$ have been centered so that $x_i'1_n = 0$ for $1 \le i \le p$. We also assume that $n > p + 1$ and that $x_1, \dots, x_p$ are linearly independent, which implies that

$$\operatorname{rank} X = p.$$

The class of error distributions we study includes the class of (spherical) multivariate-t distributions, probably the most important of the possible alternative error distributions. It is often felt in practice that the error distribution has heavier tails than the normal, and the class of multivariate-t distributions is a flexible class that allows for this possibility. These distributions are also contained in the class of scale mixtures of normal distributions and thus, by de Finetti's theorem, represent exchangeable distributions regardless of the sample size $n$.

In this paper we consider estimation of $\sigma^2 = E[(\sigma\epsilon_i)^2]$, the variance of each component of the error term, under Stein's loss (see James and Stein (1961)),

$$L_S(\delta, \sigma^2) = \delta/\sigma^2 - \log(\delta/\sigma^2) - 1. \qquad (1.3)$$

Hence the risk function $R(\{\alpha, \beta, \sigma^2\}, \delta)$ is given by $E[L_S(\delta, \sigma^2)]$. The best equivariant estimator is the unbiased estimator given by

$$\delta_U = \frac{\mathrm{RSS}}{n - p - 1}, \qquad (1.4)$$

where RSS is the residual sum of squares given by

$$\mathrm{RSS} = \|(I - X(X'X)^{-1}X')(y - \bar{y}1_n)\|^2.$$
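As a minimal sketch of (1.3) and (1.4), assuming NumPy, the residual sum of squares can be computed by regressing the centered response on the centered design. The names `stein_loss` and `unbiased_variance` are our own illustrative helpers, and the simulated data are arbitrary.

```python
import numpy as np


def stein_loss(delta, sigma2):
    """Stein's loss (1.3): delta/sigma^2 - log(delta/sigma^2) - 1."""
    ratio = np.asarray(delta, dtype=float) / sigma2
    return ratio - np.log(ratio) - 1.0


def unbiased_variance(y, X):
    """Return (RSS, delta_U) for model (1.1); X is assumed to have centered columns."""
    n, p = X.shape
    v = y - y.mean()                                   # y - ybar 1_n
    beta_hat, *_ = np.linalg.lstsq(X, v, rcond=None)   # (X'X)^{-1} X' v
    rss = float(np.sum((v - X @ beta_hat) ** 2))       # ||(I - X(X'X)^{-1}X') v||^2
    return rss, rss / (n - p - 1)                      # (1.4)


rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                                    # enforce x_i' 1_n = 0
y = 2.0 + X @ np.ones(p) + 1.5 * rng.standard_normal(n)   # alpha = 2, sigma = 1.5
rss, delta_u = unbiased_variance(y, X)
print(rss, delta_u, stein_loss(delta_u, 1.5 ** 2))
```

Because the columns of X are centered, regressing $y - \bar{y}1_n$ on X gives the same residuals as fitting the intercept and slopes jointly.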

In the Gaussian case, the Stein effect in the variance estimation problem has been studied in many papers, including Stein (1964), Strawderman (1974), Brewster and Zidek (1974), and Maruyama and Strawderman (2006). Stein (1964) showed that

$$\delta^{S} = \min\left\{\delta_U, \frac{\|y - \bar{y}1_n\|^2}{n - 1}\right\} \qquad (1.5)$$

dominates $\delta_U$. For smooth (generalized Bayes) estimators, Brewster and Zidek (1974) gave the improved estimator

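$$\delta^{BZ} = \phi^{BZ}(R^2)\,\frac{\mathrm{RSS}}{n - p - 1},$$

where $\phi^{BZ}(\cdot)$ is a smooth increasing function given by

$$\phi^{BZ}(R^2) = \cdots \qquad (1.6)$$

and $R^2$ is the coefficient of determination given by

$$R^2 = \frac{\|X(X'X)^{-1}X'(y - \bar{y}1_n)\|^2}{\|y - \bar{y}1_n\|^2}.$$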
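Since $\mathrm{RSS} = (1 - R^2)\|y - \bar{y}1_n\|^2$, Stein's estimator (1.5) is itself of the form $\phi(R^2)\delta_U$ with $\phi^S(R^2) = \min\{1, (n-p-1)/((n-1)(1-R^2))\}$, and it switches to $\|y - \bar{y}1_n\|^2/(n-1)$ exactly when $R^2 < p/(n-1)$. A small plain-Python sketch of this bookkeeping follows; the function names are our own.

```python
def stein_truncated(rss, v_norm2, n, p):
    """Stein-type estimator (1.5): min{RSS/(n-p-1), ||y - ybar 1_n||^2/(n-1)}."""
    return min(rss / (n - p - 1), v_norm2 / (n - 1))


def stein_factor(r2, n, p):
    """phi^S(R^2) with delta^S = phi^S(R^2) * delta_U, using RSS = (1 - R^2)||v||^2."""
    if r2 >= 1.0:
        return 1.0
    return min(1.0, (n - p - 1) / ((n - 1) * (1.0 - r2)))


def uses_pooled_branch(r2, n, p):
    """True exactly when ||v||^2/(n-1) < RSS/(n-p-1), i.e. when R^2 < p/(n-1)."""
    return r2 < p / (n - 1)


# Example with n = 20, p = 5, ||v||^2 = 10 and R^2 = 0.1, so RSS = 9.
n, p, v_norm2, r2 = 20, 5, 10.0, 0.1
rss = (1.0 - r2) * v_norm2
print(stein_truncated(rss, v_norm2, n, p))            # pooled branch, 10/19
print(stein_factor(r2, n, p) * rss / (n - p - 1))     # same value up to rounding
```

Because $\phi^S$ is nondecreasing in $R^2$, this estimator belongs to the class of scale-equivariant estimators $\phi(R^2)\delta_U$ studied below.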

Maruyama and Strawderman (2006) proposed another class of improved generalized Bayes estimators. The proofs in all of these papers seem to depend strongly on the normality assumption, so it may be difficult or impossible to extend the dominance results to the non-normal case. Also, many statisticians have thought that estimation of the variance is more sensitive to the assumption on the error distribution than estimation of the mean vector, where some robustness results have been derived by Maruyama and Strawderman (2005).
Note that we use the term "robustness" in the sense of distributional robustness over the class of spherically symmetric error distributions. We specifically are not using the term to indicate a high breakdown point. The use of the term "robustness" in our sense is, however, common (if somewhat misleading) in the shrinkage literature, where it refers to insensitivity to the error distribution.

In this paper, we derive a class of generalized Bayes estimators relative to a class of separable priors of the form $\pi(\alpha, \beta)\{\sigma^2\}^{-1}$ and show that the resulting generalized Bayes estimator is independent of the form of the (spherically symmetric) sampling distribution. Additionally, we show, for a particular subclass of these separable priors, $(\beta'X'X\beta)^{-(p-2)/2}\{\sigma^2\}^{-1}$, that the resulting robust generalized Bayes estimator has the additional robustness property of being minimax and dominating the unbiased estimator $\delta_U$ simultaneously, for the entire class of scale mixtures of Gaussians.

A similar (but somewhat stronger) robustness property has been studied in the context of estimation of the vector of regression parameters $(\alpha, \beta)$ by Maruyama and Strawderman (2005). They gave separable priors of a form similar to the priors in this paper for which the generalized Bayes estimators are minimax for the entire class of spherically symmetric distributions (and not just scale mixtures of normals). We suspect that the distributional robustness property of the present paper also extends well beyond the class of scale mixtures of normal distributions, but we have not been able to determine how much further it extends.

Our class of improved estimators utilizes the coefficient of determination $R^2$ in making a (smooth) choice between $\delta_U$ (when $R^2$ is large) and $\|y - \bar{y}1_n\|^2/(n-1)$ (when $R^2$ is small), and reflects the relatively common knowledge among statisticians that $\delta_U = \mathrm{RSS}/(n-p-1)$ overestimates $\sigma^2$ when $R^2$ is small. See Remark 3.1 for details.

The organization of this paper is as follows. In Section 2 we derive generalized Bayes estimators under separable priors and demonstrate that the resulting estimator is independent of the (spherically symmetric) sampling density. In Section 3 we show that a certain subclass of estimators that are minimax under normality remains minimax for the entire class of scale mixtures of normals. Further, we show that certain generalized Bayes estimators studied in Section 2 have this (double) robustness property. Some comments are given in Section 4, and an appendix gives proofs of some of the results.

2. A generalized Bayes estimator with respect to the harmonic prior 

In this section, we show that, under Stein's loss, the generalized Bayes estimator of the variance with respect to a certain class of priors is independent of the particular sampling model. We also give an exact form of this estimator for a particular subclass of "(super)harmonic" priors; we show later that the resulting estimator is minimax for a large subclass of spherically symmetric error distributions.



Y, Maruyama and W. Strawderman/Improved robust Bayes Estimators 4 

Theorem 2.1. The generalized Bayes estimator with respect to $\pi(\alpha, \beta, \sigma^2) = \pi(\alpha, \beta)\{\sigma^2\}^{-1}$ under Stein's loss (1.3) is independent of the particular spherically symmetric sampling model and hence is given by the generalized Bayes estimator under the Gaussian distribution.

Proof. See Appendix A. □

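Now let $p \ge 3$ and $\pi(\alpha, \beta) = (\beta'X'X\beta)^{-(p-a)/2}$. This is related to a family of (super)harmonic functions as follows. If, in the above joint prior for $(\alpha, \beta)$, we make the change of variables $\theta = (X'X)^{1/2}\beta$, the joint prior of $(\alpha, \theta)$ becomes

$$\pi(\alpha, \theta) = \|\theta\|^{-(p-a)}. \qquad (2.1)$$

The Laplacian of $\|\theta\|^{-(p-a)}$ is given by

$$\sum_{i=1}^{p}\frac{\partial^2}{\partial\theta_i^2}\|\theta\|^{-(p-a)} = (p-a)(2-a)\|\theta\|^{-(p-a)-2},$$

which is negative (i.e., superharmonic) for $2 < a < p$ and is zero (i.e., harmonic) for $a = 2$.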
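The Laplacian formula can be checked numerically. The plain-Python sketch below compares a central-difference approximation of the Laplacian of $\|\theta\|^{-(p-a)}$ with $(p-a)(2-a)\|\theta\|^{-(p-a)-2}$ for $p = 5$; the function names and the evaluation point are our own arbitrary choices. For $a = 2$ both values are essentially zero.

```python
import math


def prior_theta(theta, p, a):
    """||theta||^{-(p-a)}: the prior (2.1) in the coordinates theta = (X'X)^{1/2} beta."""
    r = math.sqrt(sum(t * t for t in theta))
    return r ** (-(p - a))


def laplacian_fd(f, theta, h=1e-4):
    """Central second differences summed over coordinates (approximate Laplacian)."""
    f0 = f(theta)
    total = 0.0
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        total += (f(up) - 2.0 * f0 + f(down)) / h ** 2
    return total


def laplacian_exact(theta, p, a):
    """Closed form (p - a)(2 - a)||theta||^{-(p-a)-2}."""
    r = math.sqrt(sum(t * t for t in theta))
    return (p - a) * (2 - a) * r ** (-(p - a) - 2)


theta = [0.7, -0.4, 1.1, 0.3, 0.5]          # a point in R^p with p = 5
p = len(theta)
for a in (2.0, 3.0, 4.0):
    numeric = laplacian_fd(lambda th: prior_theta(th, p, a), theta)
    print(a, numeric, laplacian_exact(theta, p, a))
```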

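Theorem 2.2. Under the model (1.1) with spherically symmetric error distribution (1.2) and Stein's loss (1.3), the generalized Bayes estimator with respect to $\pi(\alpha, \beta, \sigma^2) = (\beta'X'X\beta)^{-(p-a)/2}\{\sigma^2\}^{-1}$ for $0 < a < p$ is given by

$$\delta^{GB}_a = \phi_a(R^2)\,\frac{\mathrm{RSS}}{n - p - 1}, \qquad (2.2)$$

where

$$\phi_a(R^2) = \frac{n - p - 1}{n - a - 1}\cdot\frac{\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a-1)/2}\,dt}{\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a+1)/2}\,dt}.$$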

Proof. See Appendix B. □
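For computation, one convenient route (our own choice, not the authors') uses Euler's integral $\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a'}\,dt = B(b, c-b)\,{}_2F_1(a', b; c; z)$ for $c > b > 0$. With $b = (p-a)/2$ and $c = p/2$ the Beta factor is common to the numerator and denominator of $\phi_a$ and cancels, leaving $\phi_a(R^2) = \frac{n-p-1}{n-a-1}\,{}_2F_1\big(-\tfrac{n-p-a-1}{2}, \tfrac{p-a}{2}; \tfrac{p}{2}; R^2\big)\big/{}_2F_1\big(-\tfrac{n-p-a+1}{2}, \tfrac{p-a}{2}; \tfrac{p}{2}; R^2\big)$. The plain-Python sketch below sums the series directly; `hyp2f1`, `phi_a`, and `delta_gb` are hypothetical helper names.

```python
def hyp2f1(a, b, c, z, tol=1e-15, max_terms=1_000_000):
    """Gauss series 2F1(a, b; c; z) for 0 <= z <= 1 (z = 1 needs c - a - b > 0).

    Written for the parameters produced by (2.2): 0 < b < c and a + b < c + 1.
    For these, once the factors a + k are positive and the term ratio is below 1,
    later ratios stay below 1, so the first negligible term ends the sum.
    Convergence is slow at z = 1 when c - a - b is small.
    """
    total, term = 1.0, 1.0
    for k in range(max_terms):
        ratio = (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        term *= ratio
        total += term
        if a + k + 1 > 0 and abs(ratio) < 1.0 and abs(term) <= tol * abs(total):
            break
    return total


def phi_a(r2, n, p, a):
    """phi_a(R^2) of Theorem 2.2 as a ratio of 2F1 values (0 < a < p < n - 1)."""
    b, c = (p - a) / 2.0, p / 2.0
    num = hyp2f1(-(n - p - a - 1) / 2.0, b, c, r2)
    den = hyp2f1(-(n - p - a + 1) / 2.0, b, c, r2)
    return (n - p - 1) / (n - a - 1) * num / den


def delta_gb(rss, r2, n, p, a):
    """Generalized Bayes estimator (2.2): phi_a(R^2) * RSS / (n - p - 1)."""
    return phi_a(r2, n, p, a) * rss / (n - p - 1)


# Harmonic case a = 2 with n = 20, p = 5.
for r2 in (0.0, 0.5, 1.0):
    print(r2, phi_a(r2, 20, 5, 2.0))
```

Two checks follow from (2.2) directly: $\phi_a(0) = (n-p-1)/(n-a-1)$, and $\phi_a(1) = 1$ for every admissible $a$, by the identity $B(x, y)/B(x, y+1) = (x+y)/y$.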



3. Minimaxity 

In this section, we demonstrate the robustness of minimaxity under scale mixtures of normals for a class of estimators that are minimax under normality.

Theorem 3.1. Assume $\delta_\phi = \phi(R^2)\{\mathrm{RSS}/(n-p-1)\}$, where $\phi(\cdot)$ is monotone nondecreasing, improves on the unbiased estimator $\delta_U$ under normality and Stein's loss. Then $\delta_\phi$ also improves on the unbiased estimator $\delta_U$ under scale mixtures of normals and Stein's loss.

Proof. Let $f$ be a scale mixture of normals, where the scalar $\tau$ satisfies $E[\tau^2] = 1$, that is,

$$f(t) = \int_0^\infty (2\pi\tau^2)^{-n/2}\exp(-t/\{2\tau^2\})\,g(\tau^2)\,d\tau^2.$$
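The risk difference between these estimators is given by

$$\Delta = R(\{\alpha, \beta, \sigma^2\}, \delta_U) - R(\{\alpha, \beta, \sigma^2\}, \delta_\phi) = E\left[\{1 - \phi(R^2)\}\frac{\mathrm{RSS}}{(n-p-1)\sigma^2} + \log\phi(R^2)\right].$$

Since both estimators are scale equivariant and Stein's loss is scale invariant, we may take $\sigma^2 = 1$ without loss of generality. For given $\tau^2$, $U = \{\|y - \bar{y}1_n\|^2 - \mathrm{RSS}\}/\tau^2$ and $V = \mathrm{RSS}/\tau^2$ are independently distributed as $\chi^2_p(\lambda/\tau^2)$, with $\lambda = \beta'X'X\beta$, and $\chi^2_{n-p-1}$, respectively. Since $R^2$ is given by $1 - \mathrm{RSS}/\|y - \bar{y}1_n\|^2 = (1 + V/U)^{-1}$, we have

$$\Delta = E\left[E\left[E\left[1 - \phi(\{1 + V/U\}^{-1}) \mid \tau, V\right]\tau^2 \,\middle|\, V\right]\frac{V}{(n-p-1)E[\tau^2]} + \log\phi(\{1 + V/U\}^{-1})\right].$$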



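Now note that, by the monotone likelihood ratio property of the noncentral $\chi^2$ distribution, we have the following lemma.

Lemma 3.1. $E[\psi(\chi^2_p(\lambda/\tau^2)) \mid \tau]$ with $\lambda > 0$ is decreasing in $\tau$ if $\psi$ is increasing.

Since $1 - \phi(\{1 + V/u\}^{-1})$ is nonincreasing in $u$, Lemma 3.1 implies that $E[1 - \phi(\{1 + V/U\}^{-1}) \mid \tau, V]$ is nondecreasing in $\tau$. Because $V$ is independent of $\tau$, the covariance inequality then gives

$$E\left[E\left[1 - \phi(\{1 + V/U\}^{-1}) \mid \tau, V\right]\tau^2 \,\middle|\, V\right] \ge E[\tau^2]\,E\left[1 - \phi(\{1 + V/U\}^{-1}) \mid V\right].$$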



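Hence we get

$$\Delta \ge E\left[\{1 - \phi(\{1 + V/U\}^{-1})\}\frac{V}{n-p-1} + \log\phi(\{1 + V/U\}^{-1})\right] = \int_0^\infty\left\{R_G(\{\alpha, \beta, \tau^2\}, \delta_U) - R_G(\{\alpha, \beta, \tau^2\}, \delta_\phi)\right\}g(\tau^2)\,d\tau^2 \ge 0,$$

where $R_G$ is the risk function under the Gaussian assumption; the last inequality holds because $\delta_\phi$ improves on $\delta_U$ under normality for every value of the parameters. □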
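The following Monte Carlo sketch (not from the paper) illustrates Theorem 3.1 for one multivariate-t error distribution, generated as a scale mixture with $\tau^2 = (\nu - 2)/\chi^2_\nu$ so that $E[\tau^2] = 1$. As the estimator that improves on $\delta_U$ under normality it uses the nondecreasing Stein-type factor $\phi(R^2) = \min\{1, (n-p-1)/((n-1)(1-R^2))\}$. The design, the parameter values ($n = 20$, $p = 5$, $\nu = 10$, $\beta = 0$), and the helper names are illustrative assumptions, and we assume NumPy.

```python
import numpy as np


def phi_stein(r2, n, p):
    """Nondecreasing factor min{1, (n-p-1)/((n-1)(1-R^2))}; phi * delta_U gives (1.5)."""
    return np.minimum(1.0, (n - p - 1) / ((n - 1) * (1.0 - r2)))


def mc_risks(n=20, p=5, beta=None, nu=10.0, reps=50_000, seed=0):
    """Monte Carlo Stein-loss risks of (delta_U, phi(R^2) delta_U) under multivariate-t errors.

    The t error is a scale mixture of normals, eps = tau * z with
    tau^2 = (nu - 2)/chi^2_nu, so that E[tau^2] = 1 and Var[eps] = I_n; sigma^2 = 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X -= X.mean(axis=0)                              # centered columns
    beta = np.zeros(p) if beta is None else np.asarray(beta, dtype=float)
    H = X @ np.linalg.solve(X.T @ X, X.T)            # X (X'X)^{-1} X'
    C = np.eye(n) - np.full((n, n), 1.0 / n)         # centering matrix

    tau2 = (nu - 2.0) / rng.chisquare(nu, size=reps)
    eps = rng.standard_normal((reps, n)) * np.sqrt(tau2)[:, None]  # one tau per replicate
    Y = 1.0 + X @ beta + eps                         # alpha = 1, one replicate per row

    V = Y @ C                                        # rows are v = y - ybar 1_n
    fitted = V @ H                                   # rows are X (X'X)^{-1} X' v
    rss = np.sum((V - fitted) ** 2, axis=1)
    r2 = np.sum(fitted ** 2, axis=1) / np.sum(V ** 2, axis=1)

    delta_u = rss / (n - p - 1)
    delta_phi = phi_stein(r2, n, p) * delta_u

    def stein_loss(d):                               # (1.3) with sigma^2 = 1
        return d - np.log(d) - 1.0

    return float(stein_loss(delta_u).mean()), float(stein_loss(delta_phi).mean())


risk_u, risk_phi = mc_risks()
print(risk_u, risk_phi)   # the second should be smaller, up to Monte Carlo error
```

Both estimators are evaluated on the same simulated samples (common random numbers), which keeps the Monte Carlo noise in the risk difference small.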



Under the normality assumption, Brewster and Zidek (1974) showed that the estimator $\phi(R^2)\delta_U$ with nondecreasing $\phi$ dominates the unbiased estimator $\delta_U$ if $\phi^{BZ} \le \phi \le 1$, where $\phi^{BZ}$ is given by (1.6). Maruyama and Strawderman (2006) demonstrated that the generalized Bayes estimator of Theorem 2.2 with $a = 2$ satisfies this condition. Hence our main result shows that the generalized Bayes estimator of Theorem 2.2 with $a = 2$ is minimax for the entire class of variance mixtures of normal distributions.



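Theorem 3.2. Let $n - 1 > p \ge 3$. Under Stein's loss, the estimator given by

$$\delta^{GB}_2 = \phi_2(R^2)\,\frac{\mathrm{RSS}}{n - p - 1},$$

where

$$\phi_2(R^2) = \frac{n - p - 1}{n - 3}\cdot\frac{\int_0^1 t^{p/2-2}(1-R^2t)^{(n-p-3)/2}\,dt}{\int_0^1 t^{p/2-2}(1-R^2t)^{(n-p-1)/2}\,dt},$$

is minimax and generalized Bayes with respect to the harmonic prior

$$\pi(\alpha, \beta, \sigma^2) = (\beta'X'X\beta)^{-(p-2)/2}\{\sigma^2\}^{-1} \qquad (3.1)$$

for the entire class of scale mixtures of normals.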

Remark 3.1. The coefficient of determination is given by

$$R^2 = \frac{\|X(X'X)^{-1}X'(y - \bar{y}1_n)\|^2}{\|y - \bar{y}1_n\|^2},$$

and

$$E\left[\|X(X'X)^{-1}X'(y - \bar{y}1_n)\|^2\right] = \sigma^2(\xi + p), \qquad E\left[\|y - \bar{y}1_n\|^2\right] = \sigma^2(\xi + n - 1),$$

where $\xi = \beta'X'X\beta/\sigma^2$. Hence smaller values of $R^2$ correspond to smaller values of $\xi$.

Our class of improved estimators utilizes the coefficient of determination $R^2$ in making a (smooth) choice between $\delta_U$ (when $R^2$ and $\xi$ are large) and $\|y - \bar{y}1_n\|^2/(n-1)$ (when $R^2$ and $\xi$ are small), and reflects the relatively common knowledge among statisticians that $\delta_U = \mathrm{RSS}/(n-p-1)$ overestimates $\sigma^2$ when $\xi$ is small.
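The two expectations in Remark 3.1 follow from $E\|Ay\|^2 = \|AE[y]\|^2 + \sigma^2\operatorname{tr}(A'A)$, which needs only $\mathrm{Var}[y] = \sigma^2 I_n$ and therefore holds for every spherically symmetric error in (1.2). A NumPy sketch (the helper name is ours) evaluates both expectations exactly for a given centered design:

```python
import numpy as np


def expected_sums_of_squares(X, beta, sigma2):
    """Exact E||X(X'X)^{-1}X'(y - ybar 1_n)||^2 and E||y - ybar 1_n||^2.

    Uses E||A y||^2 = ||A E[y]||^2 + sigma2 * tr(A'A), valid whenever
    Var[y] = sigma2 * I_n. X is assumed to have centered columns.
    """
    n = X.shape[0]
    C = np.eye(n) - np.full((n, n), 1.0 / n)     # v = C y
    P = X @ np.linalg.solve(X.T @ X, X.T)       # projection onto the columns of X
    mu_v = C @ (X @ beta)                       # E[v]; the intercept term is removed by C
    A = P @ C                                   # maps y to X(X'X)^{-1}X'(y - ybar 1_n)
    e_fit = mu_v @ P @ mu_v + sigma2 * np.trace(A.T @ A)   # = sigma2 (xi + p)
    e_tot = mu_v @ mu_v + sigma2 * np.trace(C)             # = sigma2 (xi + n - 1)
    return float(e_fit), float(e_tot)


rng = np.random.default_rng(1)
X = rng.standard_normal((25, 3))
X -= X.mean(axis=0)
for scale in (0.0, 0.5, 2.0):
    e_fit, e_tot = expected_sums_of_squares(X, scale * np.ones(3), 1.0)
    print(scale, e_fit / e_tot)   # increases with xi; p/(n-1) = 0.125 at xi = 0 (up to rounding)
```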

Remark 3.2. The estimator $\delta^{GB}_2$ is not the only minimax generalized Bayes estimator under scale mixtures of normals. In Theorem 2.2, we also provided the generalized Bayes estimator with respect to the superharmonic prior given by $\pi(\alpha, \beta, \sigma^2) = (\beta'X'X\beta)^{-(p-a)/2}\{\sigma^2\}^{-1}$. In Maruyama and Strawderman (2006), we showed that $\delta^{GB}_a$ with $2 < a(n, p) < a < p$ is minimax in the normal case with a monotone $\phi_a$. Hence $\delta^{GB}_a$ for $a$ in this range is also minimax and generalized Bayes for the entire class of scale mixtures of normals. The bound $a(n, p)$ has a somewhat complicated form, and we omit it (see Maruyama and Strawderman (2006) for details). Since $\delta^{GB}_2$ and $\delta_U$ correspond to $a = 2$ and $a = p$, respectively, we conjecture that $\delta^{GB}_a$ with $2 < a < p$ is minimax.

Remark 3.3. Under the normality assumption, Maruyama and Strawderman (2006) gave a subclass of minimax generalized Bayes estimators with the particularly simple form

$$\delta_c = \{1 + c(1 - R^2)\}^{-1}\,\frac{\mathrm{RSS}}{n - p - 1}$$

for $0 < c < c(n, p)$, where $c(n, p)$ has a slightly complicated form, which we omit (see Maruyama and Strawderman (2006) for details). Under spherical symmetry, this estimator is not necessarily generalized Bayes (see the following remark), but it is still minimax under scale mixtures of normals.
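The simple form in Remark 3.3 is also a special case of (2.2). As a short calculation of our own (a sketch, not taken from Maruyama and Strawderman (2006)), set $a = n - p - 1$, which lies in $(0, p)$ exactly when $(n-1)/2 < p < n - 1$, and for which the prior exponent is $-(p-a)/2 = -p + (n-1)/2$. The exponent of $(1 - R^2t)$ becomes $0$ in the numerator of $\phi_a$ and $1$ in the denominator. Writing $b = (p-a)/2$ and using the mean $b/(b + a/2) = (p-a)/p$ of the Beta$(b, a/2)$ distribution,

$$
\begin{aligned}
\int_0^1 t^{b-1}(1-t)^{a/2-1}\,dt &= B(b, a/2),\\
\int_0^1 t^{b-1}(1-t)^{a/2-1}(1 - R^2t)\,dt &= B(b, a/2)\left(1 - \frac{p-a}{p}R^2\right).
\end{aligned}
$$

Since $n - a - 1 = p$ and $p - a = 2p - n + 1$,

$$
\phi_{n-p-1}(R^2) = \frac{n-p-1}{p}\cdot\frac{1}{1 - \frac{2p-n+1}{p}R^2} = \frac{n-p-1}{p - (2p-n+1)R^2} = \left\{1 + \frac{2p-n+1}{n-p-1}(1-R^2)\right\}^{-1},
$$

which is of the form in Remark 3.3 with $c = (2p-n+1)/(n-p-1)$.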
Remark 3.4. Interestingly, when $(n-1)/2 < p < n-1$, the generalized Bayes estimator with respect to $(\beta'X'X\beta)^{-p+(n-1)/2}\{\sigma^2\}^{-1}$ is given by

$$\delta^{GB}_{n-p-1} = \left\{1 + \frac{2p - n + 1}{n - p - 1}(1 - R^2)\right\}^{-1}\frac{\mathrm{RSS}}{n - p - 1}$$

for the entire class of spherically symmetric distributions (see Maruyama and Strawderman (2006) for the technical details). Hence, when

$$\frac{2p - n + 1}{n - p - 1} < c(n, p), \qquad (3.2)$$

$\delta^{GB}_{n-p-1}$ is minimax and generalized Bayes for the entire class of scale mixtures of normals. Unfortunately, numerical calculations indicate that, for $n$ in the range $(25, 10{,}000)$, the inequality (3.2) is satisfied only for $p = (n+1)/2$ when $n$ is odd, and for $p = n/2$ and $p = n/2 + 1$ when $n$ is even.

Remark 3.5. For Theorems 3.1 and 3.2, the choice of the loss function is key. Many of the results introduced in Section 1 are given under the quadratic loss function $(\delta/\sigma^2 - 1)^2$. Under the Gaussian assumption, the corresponding results can be obtained by replacing $n + 2$ by $n$. On the other hand, under quadratic loss the generalized Bayes estimator with respect to $\pi(\alpha, \beta, \sigma^2) = \pi(\alpha, \beta)\{\sigma^2\}^{-1}$ depends on the particular sampling model, and hence the robustness results do not hold under non-Gaussian assumptions.

4. Concluding Remarks

In this paper, we have studied estimation of the error variance in a general linear model with a spherically symmetric error distribution. We have shown, under Stein's loss, that separable priors of the form $\pi(\alpha, \beta)\{\sigma^2\}^{-1}$ have associated generalized Bayes estimators that are independent of the form of the (spherically symmetric) sampling distribution. We have further exhibited a subclass of "superharmonic" priors for which these generalized Bayes estimators dominate the usual unbiased and best equivariant estimator, $\delta_U$, for the entire class of scale mixtures of normal error distributions.

We have previously studied a very similar class of prior distributions in the problem of estimating the regression coefficients $(\alpha, \beta)$ under quadratic loss (see Maruyama and Strawderman (2005)). In that study we demonstrated a similar double robustness property, to wit, that the generalized Bayes estimators are independent of the form of the sampling distribution and that they are minimax over the entire class of spherically symmetric distributions.

The main differences between the classes of priors in the two settings are: a) in the present study, the prior on $\sigma^2$ is proportional to $\{\sigma^2\}^{-1}$, while it is proportional to a different power of $\sigma^2$ in the earlier study; and b) in this paper, the prior on $(\alpha, \beta)$ is also separable, with $\alpha$ uniform on the real line and $\beta$ having the "superharmonic" form, while in the earlier paper $(\alpha, \beta)$ jointly had the superharmonic form.
The difference a) is essential, since a prior on $\sigma^2$ proportional to $\{\sigma^2\}^{-1}$ gives the best equivariant and minimax estimator $\delta_U$, while such a restriction is not necessary when estimating the regression parameters $(\alpha, \beta)$.

The difference in b) is inessential, and either form of prior on the regression parameters $(\alpha, \beta)$ will give estimators with the double robustness properties in each of the problems studied. The form of the estimators will, of course, be somewhat different. In the case of the present paper, the main difference would be to replace $n - p - 1$ by $n - p$ and to replace $R^2$ by

$$\{n\bar{y}^2 + \|X(X'X)^{-1}X'y\|^2\}/\|y\|^2.$$

As a consequence, the results in these papers suggest that separable priors, and in particular the "harmonic" prior given in (3.1), are very worthy candidates as objective priors in regression problems. They produce generalized Bayes minimax procedures dominating the classical unbiased, best equivariant estimators of both regression parameters and scale parameters simultaneously and uniformly over a broad class of spherically symmetric error distributions.



Appendix A: Proof of Theorem 2.1 

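The (generalized) Bayes estimator under Stein's loss is given by $\{E[1/\sigma^2 \mid y]\}^{-1}$, since it minimizes the posterior expected loss $\delta E[1/\sigma^2 \mid y] - \log\delta + \text{const}$ over $\delta$. Under the improper density $\pi(\alpha, \beta, \sigma^2) = \pi(\alpha, \beta)\{\sigma^2\}^{-1}$, the generalized Bayes estimator is given by

$$\frac{\iint m_f^0(y \mid \alpha, \beta)\,\pi(\alpha, \beta)\,d\alpha\,d\beta}{\iint m_f^1(y \mid \alpha, \beta)\,\pi(\alpha, \beta)\,d\alpha\,d\beta},$$

where $m_f^i(y \mid \alpha, \beta)$ for $i = 0, 1$ is the conditional marginal density of $y$ with respect to $\{\sigma^2\}^{-1-i}$ given $\alpha$ and $\beta$,

$$m_f^i(y \mid \alpha, \beta) = \int_0^\infty \frac{1}{(\sigma^2)^{n/2}}\,f\left(\frac{\|y - \alpha 1_n - X\beta\|^2}{\sigma^2}\right)\frac{d\sigma^2}{(\sigma^2)^{1+i}}.$$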
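A quick Monte Carlo check of this Bayes rule, in plain Python: given draws of $\sigma^2$ from a posterior, the average Stein loss is minimized at the reciprocal of the average of $1/\sigma^2$. The inverse-gamma posterior below is a hypothetical choice made only for illustration, and the helper names are ours.

```python
import math
import random


def stein_bayes_rule(draws):
    """{E[1/sigma^2 | y]}^{-1}, estimated from posterior draws of sigma^2."""
    return len(draws) / sum(1.0 / s for s in draws)


def posterior_stein_risk(delta, draws):
    """Posterior expected Stein loss (1.3) of delta, averaged over the draws."""
    return sum(delta / s - math.log(delta / s) - 1.0 for s in draws) / len(draws)


# Hypothetical posterior for illustration: sigma^2 = 1/eta, eta ~ Gamma(shape 6, scale 0.25).
random.seed(1)
draws = [1.0 / random.gammavariate(6.0, 0.25) for _ in range(10_000)]
best = stein_bayes_rule(draws)
for d in (0.9 * best, best, 1.1 * best):
    print(round(d, 4), round(posterior_stein_risk(d, draws), 6))
print("posterior mean of sigma^2:", round(sum(draws) / len(draws), 4))
```

The middle line has the smallest average loss, and the rule lies below the posterior mean of $\sigma^2$, as Jensen's inequality requires.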

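Further, the change of variables $t = \|y - \alpha 1_n - X\beta\|^2/\sigma^2$ gives

$$m_f^i(y \mid \alpha, \beta) = \|y - \alpha 1_n - X\beta\|^{-n-2i}\int_0^\infty t^{n/2-1+i} f(t)\,dt,$$

and the same expression holds with $f$ replaced by the Gaussian density

$$f_G(t) = \frac{\exp(-t/2)}{(2\pi)^{n/2}}.$$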

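Hence the generalized Bayes estimator is

$$\frac{\iint m_f^0(y \mid \alpha, \beta)\pi(\alpha, \beta)\,d\alpha\,d\beta}{\iint m_f^1(y \mid \alpha, \beta)\pi(\alpha, \beta)\,d\alpha\,d\beta} = \frac{\int_0^\infty t^{n/2-1} f(t)\,dt}{\int_0^\infty t^{n/2} f(t)\,dt}\cdot\frac{\int_0^\infty t^{n/2} f_G(t)\,dt}{\int_0^\infty t^{n/2-1} f_G(t)\,dt}\cdot\frac{m_G^0(y)}{m_G^1(y)},$$

where

$$m_G^i(y) = \int_0^\infty\iint f_G\left(\frac{\|y - \alpha 1_n - X\beta\|^2}{\sigma^2}\right)\frac{\pi(\alpha, \beta)}{(\sigma^2)^{n/2+i+1}}\,d\alpha\,d\beta\,d\sigma^2.$$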

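Since $\epsilon$ has a spherically symmetric density $f(\epsilon'\epsilon)$ with $E[\epsilon] = 0_n$ and $\mathrm{Var}[\epsilon] = I_n$, $f$ as well as $f_G$ satisfies

$$\int_{\mathbb{R}^n} f(\epsilon'\epsilon)\,d\epsilon = \frac{\pi^{n/2}}{\Gamma(n/2)}\int_0^\infty s^{n/2-1} f(s)\,ds = 1 \qquad (A.1)$$

and

$$\int_{\mathbb{R}^n} \epsilon'\epsilon\,f(\epsilon'\epsilon)\,d\epsilon = \frac{\pi^{n/2}}{\Gamma(n/2)}\int_0^\infty s^{n/2} f(s)\,ds = n. \qquad (A.2)$$

Dividing (A.1) by (A.2), we have

$$\frac{\int_0^\infty t^{n/2-1} f(t)\,dt}{\int_0^\infty t^{n/2} f(t)\,dt} = \frac{\int_0^\infty t^{n/2-1} f_G(t)\,dt}{\int_0^\infty t^{n/2} f_G(t)\,dt} = \frac{1}{n},$$

and hence the generalized Bayes estimator is given by $m_G^0(y)/m_G^1(y)$, which is independent of $f$.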
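The key fact is that the ratio of the two radial moments does not depend on $f$. A numerical sketch of our own (plain Python, trapezoidal rule after the substitution $s = e^x$) evaluates the moments in (A.1) and (A.2) for the Gaussian radial density and for a multivariate-t radial density; rescaling the t density so that $\mathrm{Var}[\epsilon] = I_n$ is our illustrative assumption, and the helper names are ours.

```python
import math


def radial_moment(log_f, k, lo=-60.0, hi=60.0, steps=24_000):
    """Approximate int_0^inf s^k f(s) ds via s = e^x and the trapezoidal rule in x."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp((k + 1.0) * x + log_f(math.exp(x)))   # ds = s dx
    return total * h


def log_gaussian(n):
    """log f_G(s) with f_G(s) = exp(-s/2) / (2 pi)^{n/2}."""
    return lambda s: -s / 2.0 - (n / 2.0) * math.log(2.0 * math.pi)


def log_student_t(n, nu):
    """log of the multivariate-t radial density scaled so that Var[eps] = I_n (nu > 2)."""
    k = nu - 2.0
    const = (math.lgamma((nu + n) / 2.0) - math.lgamma(nu / 2.0)
             - (n / 2.0) * math.log(k * math.pi))
    return lambda s: const - (nu + n) / 2.0 * math.log1p(s / k)


n = 5
for name, log_f in (("Gaussian", log_gaussian(n)), ("t, nu = 5", log_student_t(n, 5.0))):
    m0 = radial_moment(log_f, n / 2.0 - 1.0)
    m1 = radial_moment(log_f, n / 2.0)
    scale = math.pi ** (n / 2.0) / math.gamma(n / 2.0)
    print(name, scale * m0, scale * m1, m0 / m1)   # approximately 1, n, 1/n
```

Working with log densities avoids overflow of $s^{k+1}$ at the far end of the grid.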

Appendix B: Proof of Theorem 2.2 

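Note that, for $0 < a < p$ (substitute $u = \beta'X'X\beta/(2\sigma^2 g)$ and use the Gamma integral),

$$(\beta'X'X\beta)^{-(p-a)/2} = \frac{2^{a/2}\pi^{p/2}}{\Gamma(\{p-a\}/2)|X'X|^{1/2}}\,(\sigma^2)^{a/2}\int_0^\infty g^{a/2-1}\,\frac{|X'X|^{1/2}}{(2\pi\sigma^2 g)^{p/2}}\exp\left(-\frac{\beta'X'X\beta}{2\sigma^2 g}\right)dg. \qquad (B.1)$$

Then

$$m_G^i(y) = A\int_0^\infty\int_0^\infty\iint \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|y - \alpha 1_n - X\beta\|^2}{2\sigma^2}\right)\frac{g^{a/2-1}}{\{\sigma^2\}^{-a/2+1+i}}\,\frac{|X'X|^{1/2}}{(2\pi\sigma^2)^{p/2}g^{p/2}}\exp\left(-\frac{\beta'X'X\beta}{2\sigma^2 g}\right)d\alpha\,d\beta\,d\sigma^2\,dg, \qquad (B.2)$$

where $A = \{2^{a/2}\pi^{p/2}\}/\{\Gamma(\{p-a\}/2)|X'X|^{1/2}\}$. In the following, we calculate the integral in (B.2) with respect to $\alpha$, $\beta$, $\sigma^2$, and $g$, in this order.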
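By the simple relation

$$y - \alpha 1_n - X\beta = (-\alpha + \bar{y})1_n + v - X\beta,$$

where $\bar{y}$ is the mean of $y$ and $v = y - \bar{y}1_n$, we have the Pythagorean relation

$$\|y - \alpha 1_n - X\beta\|^2 = n(-\alpha + \bar{y})^2 + \|v - X\beta\|^2,$$

since $1_n'v = 0$ and $X$ has already been centered. Then we have

$$\int_{-\infty}^{\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|y - \alpha 1_n - X\beta\|^2}{2\sigma^2}\right)d\alpha = \frac{n^{-1/2}}{(2\pi\sigma^2)^{(n-1)/2}}\exp\left(-\frac{\|v - X\beta\|^2}{2\sigma^2}\right).$$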

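Next we consider the integration with respect to $\beta$. Note the relation of completing squares with respect to $\beta$,

$$\|v - X\beta\|^2 + g^{-1}\beta'X'X\beta = \frac{1+g}{g}\left(\beta - \frac{g}{1+g}\hat\beta\right)'X'X\left(\beta - \frac{g}{1+g}\hat\beta\right) + \frac{\|v\|^2}{1+g}\{g(1 - R^2) + 1\},$$

where $\hat\beta = (X'X)^{-1}X'v$ and $R^2 = \|X\hat\beta\|^2/\|v\|^2$ is the coefficient of determination. Hence we have

$$\int_{\mathbb{R}^p}\int_{-\infty}^{\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|y - \alpha 1_n - X\beta\|^2}{2\sigma^2}\right)\frac{|X'X|^{1/2}}{(2\pi\sigma^2)^{p/2}g^{p/2}}\exp\left(-\frac{\beta'X'X\beta}{2\sigma^2 g}\right)d\alpha\,d\beta = \frac{n^{-1/2}(1+g)^{-p/2}}{(2\pi\sigma^2)^{(n-1)/2}}\exp\left(-\frac{\|v\|^2\{g(1-R^2)+1\}}{2\sigma^2(g+1)}\right). \qquad (B.3)$$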
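A quick numerical check of the completing-the-square identity used for (B.3), assuming NumPy; the random design, response, and values of $g$ are arbitrary, and `both_sides` is our own helper name.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 15, 4
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                                  # centered design
v = rng.standard_normal(n)
v -= v.mean()                                        # v = y - ybar 1_n sums to zero
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ v)             # (X'X)^{-1} X' v
r2 = float(np.sum((X @ beta_hat) ** 2) / (v @ v))    # coefficient of determination


def both_sides(beta, g):
    """Left and right sides of the completing-the-square identity before (B.3)."""
    lhs = np.sum((v - X @ beta) ** 2) + beta @ XtX @ beta / g
    d = beta - g / (1.0 + g) * beta_hat
    rhs = (1.0 + g) / g * (d @ XtX @ d) + (v @ v) / (1.0 + g) * (g * (1.0 - r2) + 1.0)
    return float(lhs), float(rhs)


for g in (0.1, 1.0, 10.0):
    print(g, both_sides(rng.standard_normal(p), g))
```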

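Next we consider integration with respect to $\sigma^2$. By (B.3), we have

$$\int_0^\infty \frac{n^{-1/2}(1+g)^{-p/2}}{(2\pi\sigma^2)^{(n-1)/2}}\exp\left(-\frac{\|v\|^2\{g(1-R^2)+1\}}{2\sigma^2(g+1)}\right)\frac{d\sigma^2}{\{\sigma^2\}^{-a/2+1+i}} = \frac{2^{-a/2+i}n^{-1/2}\Gamma(\{n-a-1+2i\}/2)}{\pi^{(n-1)/2}\|v\|^{n-a-1+2i}}\cdot\frac{(1+g)^{(n-p-a-1+2i)/2}}{\{g(1-R^2)+1\}^{(n-a-1+2i)/2}}. \qquad (B.4)$$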

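Finally we consider integration with respect to $g$. By (B.4) we have

$$
\begin{aligned}
m_G^i(y) &= A\,\frac{2^{-a/2+i}n^{-1/2}\Gamma(\{n-a-1+2i\}/2)}{\pi^{(n-1)/2}\|v\|^{n-a-1+2i}}\int_0^\infty \frac{g^{a/2-1}(1+g)^{(n-p-a-1+2i)/2}}{\{g(1-R^2)+1\}^{(n-a-1+2i)/2}}\,dg\\
&= A\,\frac{2^{-a/2+i}n^{-1/2}\Gamma(\{n-a-1+2i\}/2)}{\pi^{(n-1)/2}\|v\|^{n-a-1+2i}(1-R^2)^{(n-p-1+2i)/2}}\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a-1+2i)/2}\,dt. \qquad (B.5)
\end{aligned}
$$

The second equality follows from the change of variables $1/\{1 + g(1-R^2)\} \to t$. Taking the ratio of (B.5) for $i = 0$ and $i = 1$ and using $\Gamma(\{n-a+1\}/2) = \{(n-a-1)/2\}\,\Gamma(\{n-a-1\}/2)$ gives

$$\frac{m_G^0(y)}{m_G^1(y)} = \frac{(1-R^2)\|v\|^2}{n-a-1}\cdot\frac{\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a-1)/2}\,dt}{\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a+1)/2}\,dt}.$$

By using the relation $(1 - R^2)\|y - \bar{y}1_n\|^2 = \mathrm{RSS}$, $m_G^0(y)/m_G^1(y)$ is written as (2.2).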
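A numerical check of our own of the change of variables behind (B.5): the $g$-integral should equal $(1-R^2)^{-(n-p-1+2i)/2}$ times the $t$-integral. The sketch is plain Python; the parameter values are arbitrary and restricted to $a \ge 2$ and $p - a \ge 2$ so that the $t$-integrand is bounded and Simpson's rule applies directly, and the helper names are ours.

```python
import math


def g_integral(n, p, a, r2, i, lo=-60.0, hi=60.0, steps=24_000):
    """int_0^inf g^{a/2-1}(1+g)^{(n-p-a-1+2i)/2}{g(1-R^2)+1}^{-(n-a-1+2i)/2} dg.

    Trapezoidal rule after g = e^x, so dg = g dx (absorbed into the g^{a/2} factor).
    """
    e1 = (n - p - a - 1 + 2 * i) / 2.0
    e2 = (n - a - 1 + 2 * i) / 2.0
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        g = math.exp(x)
        log_val = (a / 2.0) * x + e1 * math.log1p(g) - e2 * math.log1p(g * (1.0 - r2))
        total += (0.5 if j in (0, steps) else 1.0) * math.exp(log_val)
    return total * h


def t_integral(n, p, a, r2, i, m=2_000):
    """int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2 t)^{(n-p-a-1+2i)/2} dt by Simpson's rule.

    Assumes a >= 2 and p - a >= 2, so the integrand is bounded on [0, 1].
    """
    def f(t):
        return (t ** (p / 2 - a / 2 - 1) * (1.0 - t) ** (a / 2 - 1)
                * (1.0 - r2 * t) ** ((n - p - a - 1 + 2 * i) / 2))

    h = 1.0 / m
    s = f(0.0) + f(1.0)
    s += 4.0 * sum(f((2 * k - 1) * h) for k in range(1, m // 2 + 1))
    s += 2.0 * sum(f(2 * k * h) for k in range(1, m // 2))
    return s * h / 3.0


n, p, a, r2 = 20, 8, 4.0, 0.4
for i in (0, 1):
    lhs = g_integral(n, p, a, r2, i)
    rhs = (1.0 - r2) ** (-(n - p - 1 + 2 * i) / 2.0) * t_integral(n, p, a, r2, i)
    print(i, lhs, rhs)
```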






References 

Brewster, J. F. and Zidek, J. V. (1974). Improving on equivariant estimators. Ann. Statist. 2 21-38. MR0381098

James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I 361-379. Univ. California Press, Berkeley, Calif. MR0133191

Maruyama, Y. and Strawderman, W. E. (2005). A new class of generalized Bayes minimax ridge regression estimators. Ann. Statist. 33 1753-1770. MR2166561

Maruyama, Y. and Strawderman, W. E. (2006). A new class of minimax generalized Bayes estimators of a normal variance. J. Statist. Plann. Inference 136 3822-3836. MR2299167

Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16 155-160. MR0171344

Strawderman, W. E. (1974). Minimax estimation of powers of the variance of a normal population under squared error loss. Ann. Statist. 2 190-198. MR0343442



